    Transformer and Snowball Graph Convolution Learning for Biomedical Graph Classification

    Graphs, or networks, are widely used to describe and model complex systems in biomedicine, and deep learning methods, especially graph neural networks (GNNs), have been developed to learn and predict with such structured data. In this paper, we propose transformer and snowball encoding networks (TSEN), a novel architecture for biomedical graph classification that introduces a transformer with graph snowball connections into GNNs to learn whole-graph representations. TSEN combines graph snowball connections with a graph transformer through snowball encoding layers, which strengthens its ability to capture multi-scale information and global patterns in whole-graph features. In addition, TSEN uses snowball graph convolution as the position embedding in the transformer structure, a simple yet effective way to capture local patterns naturally. Experiments on four graph classification datasets demonstrate that TSEN outperforms both typical state-of-the-art GNN models and graph-transformer-based GNN models.
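
    The paper gives no code, but the two ideas compose naturally. Below is a minimal PyTorch sketch, assuming dense adjacency matrices and illustrative layer sizes: a snowball stack whose layer k consumes the concatenation of the input and all previous layers' outputs, with that multi-scale output added as the position embedding before self-attention. The class and parameter names are hypothetical, not TSEN's actual implementation.

    ```python
    import torch
    import torch.nn as nn

    class SnowballGCN(nn.Module):
        """Snowball-style stack: each layer consumes the concatenation of
        the input features and all previous layers' outputs."""
        def __init__(self, in_dim, hidden_dim, num_layers):
            super().__init__()
            self.layers = nn.ModuleList()
            dim = in_dim
            for _ in range(num_layers):
                self.layers.append(nn.Linear(dim, hidden_dim))
                dim += hidden_dim  # snowball: feature width grows each layer

        def forward(self, adj, x):
            feats = [x]
            for layer in self.layers:
                h = torch.relu(layer(adj @ torch.cat(feats, dim=-1)))
                feats.append(h)
            return torch.cat(feats[1:], dim=-1)  # multi-scale node embedding

    class SnowballEncodingBlock(nn.Module):
        """Hypothetical snowball encoding layer: the snowball convolution
        output serves as the position embedding before self-attention."""
        def __init__(self, in_dim, hidden_dim, gcn_layers, heads=4):
            super().__init__()
            self.gcn = SnowballGCN(in_dim, hidden_dim, gcn_layers)
            d_model = hidden_dim * gcn_layers
            self.proj = nn.Linear(in_dim, d_model)
            self.attn = nn.TransformerEncoderLayer(d_model, heads,
                                                   batch_first=True)

        def forward(self, adj, x):
            pos = self.gcn(adj, x)                 # local patterns as positions
            return self.attn(self.proj(x) + pos)   # global patterns via attention
    ```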

    A multiqueue interlacing peak scheduling method based on tasks’ classification in cloud computing

    In cloud computing, resources are dynamic and the demands that tasks place on them are diverse. These factors can cause load imbalances that hurt scheduling efficiency and resource utilization. A scheduling method called interlacing peak is proposed. First, resource load information, such as CPU, I/O, and memory usage, is periodically collected and updated, along with each task's CPU, I/O, and memory demands. Second, resources are sorted into three queues according to their CPU, I/O, and memory loads, and tasks are classified as CPU intensive, I/O intensive, or memory intensive according to their demands. Finally, tasks are scheduled so that they interlace with the resource load peaks: each type of task is matched with the resources whose corresponding load is lightest. In other words, CPU-intensive tasks are matched with resources with low CPU utilization; I/O-intensive tasks are matched with resources with short I/O wait times; and memory-intensive tasks are matched with resources with low memory usage. The effectiveness of this method is proved theoretically, and its time and space complexity are shown to be low. Four experiments were designed to verify its performance using four metrics: 1) average response time; 2) load balancing; 3) deadline violation rate; and 4) resource utilization. The results show that the method balances loads and improves resource allocation and utilization effectively, especially when resources are limited and many tasks compete for the same resources, where it shows an advantage over other similar standard algorithms.
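
    A minimal sketch of the interlacing-peak matching idea, assuming a greedy min-load policy and toy load fields; the data layout and update rule are illustrative assumptions, not the paper's exact algorithm.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Resource:
        name: str
        cpu: float   # current CPU utilization in [0, 1]
        io: float    # current I/O wait ratio in [0, 1]
        mem: float   # current memory usage in [0, 1]

    def classify(task_demand):
        """Label a task by its dominant demand: 'cpu', 'io', or 'mem'."""
        return max(task_demand, key=task_demand.get)

    def schedule(tasks, resources):
        """Match each task with the resource whose corresponding load is
        lightest, so task peaks interlace with resource load valleys."""
        placements = {}
        for task_id, demand in tasks.items():
            kind = classify(demand)                        # dominant dimension
            target = min(resources, key=lambda r: getattr(r, kind))
            placements[task_id] = target.name
            # Reflect the new load so later tasks see the updated state.
            setattr(target, kind, getattr(target, kind) + demand[kind])
        return placements

    tasks = {"t1": {"cpu": 0.6, "io": 0.1, "mem": 0.2},
             "t2": {"cpu": 0.1, "io": 0.7, "mem": 0.1}}
    resources = [Resource("r1", 0.8, 0.2, 0.5), Resource("r2", 0.3, 0.6, 0.4)]
    print(schedule(tasks, resources))  # t1 -> r2 (low CPU), t2 -> r1 (low I/O)
    ```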

    A multi-objective optimization scheduling method based on the ant colony algorithm in cloud computing

    For task-scheduling problems in cloud computing, a multi-objective optimization method is proposed here. First, considering the diversity of resources and tasks in cloud computing, we propose a resource cost model that defines the demand of tasks on resources in detail and reflects the relationship between the user's resource costs and the budget costs. A multi-objective optimization scheduling method is then built on this model: it treats the makespan and the user's budget costs as constraints of the optimization problem, achieving multi-objective optimization of both performance and cost. An improved ant colony algorithm is proposed to solve this problem. Two constraint functions evaluate and provide feedback on performance and budget cost, letting the algorithm adjust the quality of its solutions in a timely manner to reach the optimal solution. Simulation experiments evaluate this method's performance using four metrics: 1) makespan; 2) cost; 3) deadline violation rate; and 4) resource utilization. The results show that, on these four metrics, the proposed multi-objective optimization method outperforms other similar methods, improving by 56.6% in the best case.
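
    A minimal sketch of one iteration of the constraint-guided ant colony idea, assuming the two constraint functions decay linearly once the deadline or budget is exceeded and that their product scales the pheromone deposit; the specific functional forms and parameters are assumptions, not the paper's formulas.

    ```python
    import random

    def constraint_time(makespan, deadline):
        """Performance constraint: 1 when within the deadline, decaying beyond."""
        return 1.0 if makespan <= deadline else deadline / makespan

    def constraint_cost(cost, budget):
        """Budget constraint: 1 when within budget, decaying beyond."""
        return 1.0 if cost <= budget else budget / cost

    def ant_iteration(pheromone, tasks, resources, exec_time, exec_cost,
                      deadline, budget, alpha=1.0, rho=0.1):
        """One iteration: an ant assigns tasks to resources, then the two
        constraint functions scale the pheromone deposit as feedback."""
        # Construct a solution probabilistically from pheromone strength.
        solution = {}
        for t in tasks:
            weights = [pheromone[(t, r)] ** alpha for r in resources]
            solution[t] = random.choices(resources, weights=weights)[0]
        makespan = max(sum(exec_time[(t, r)] for t, r in solution.items()
                           if r == res) for res in resources)
        cost = sum(exec_cost[(t, r)] for t, r in solution.items())
        quality = (constraint_time(makespan, deadline)
                   * constraint_cost(cost, budget))
        # Evaporate, then reinforce chosen edges in proportion to quality.
        for key in pheromone:
            pheromone[key] *= (1 - rho)
        for t, r in solution.items():
            pheromone[(t, r)] += quality
        return solution, makespan, cost
    ```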

    An Effective News Recommendation Method for Microblog User

    Recommending news stories to users based on their preferences has long been a favourite domain of recommender systems research. Traditional systems strive to satisfy users by tracking their reading history and choosing suitable candidate news articles to recommend. However, most news websites do not require users to register before reading news. Moreover, the latent relations between news and microblogs, the popularity of particular news, and the organization of news were not addressed or handled efficiently in previous approaches. To solve these issues, we propose an effective personalized news recommendation method based on microblog user-profile building and subclass popularity prediction: we organize news with a hybrid of classification and clustering, implement a subclass popularity prediction method, and construct user profiles suited to this setting. We designed several experiments comparing our method with state-of-the-art approaches on a real-world dataset, and the results demonstrate that our system significantly improves accuracy and diversity on mass text data.
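
    A minimal sketch of how profile matching and subclass popularity might be blended at ranking time, assuming term-frequency profiles and a fixed mixing weight `beta`; all names and the scoring form are hypothetical stand-ins for the paper's method.

    ```python
    from collections import Counter

    def build_profile(user_posts, vocabulary):
        """Hypothetical microblog profile: normalized term frequencies over
        a shared vocabulary."""
        counts = Counter(w for post in user_posts for w in post.split()
                         if w in vocabulary)
        total = sum(counts.values()) or 1
        return {w: c / total for w, c in counts.items()}

    def score(article, profile, subclass_popularity, beta=0.3):
        """Blend content match against the profile with the predicted
        popularity of the article's subclass; beta is an assumed weight."""
        match = sum(profile.get(w, 0.0) for w in article["terms"])
        return (1 - beta) * match + beta * subclass_popularity[article["subclass"]]

    def recommend(articles, profile, subclass_popularity, k=5):
        ranked = sorted(articles, reverse=True,
                        key=lambda a: score(a, profile, subclass_popularity))
        return ranked[:k]
    ```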

    Deep Learning Overloaded Vehicle Identification for Long Span Bridges Based on Structural Health Monitoring Data

    Overloaded vehicles cause great harm to transportation infrastructure. Bridge weigh-in-motion (BWIM) methods for overloaded vehicle identification are growing more popular because they can be implemented without interrupting traffic. However, their application is still limited because their effectiveness depends largely on professional knowledge and extra information, and they are susceptible to the presence of multiple vehicles. In this paper, a deep-learning-based overloaded vehicle identification approach (DOVI) is proposed for long-span bridges using structural health monitoring data. The DOVI model uses temporal convolutional architectures to extract the spatial and temporal features of the input sequence data, providing an end-to-end solution that needs neither the influence line nor advance knowledge of velocity and wheelbase, and that can be applied when multiple vehicles are present. Model evaluations are conducted on a simply supported beam and a long-span cable-stayed bridge under random traffic flow. Results demonstrate that the proposed approach is more effective and robust than other machine learning and deep learning approaches.
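
    The paper does not publish its network configuration, so the following is a minimal PyTorch sketch of the general pattern it names: stacked dilated 1-D convolutions over multi-sensor monitoring sequences, followed by a classification head. Channel counts, depth, and the pooling choice are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    class TemporalConvClassifier(nn.Module):
        """Hypothetical stand-in for DOVI: dilated temporal convolutions
        over sensor channels, then global pooling and classification."""
        def __init__(self, num_sensors, num_classes, channels=64, levels=4):
            super().__init__()
            blocks, in_ch = [], num_sensors
            for i in range(levels):
                dilation = 2 ** i  # receptive field grows with depth
                blocks += [nn.Conv1d(in_ch, channels, kernel_size=3,
                                     padding=dilation, dilation=dilation),
                           nn.ReLU()]
                in_ch = channels
            self.tcn = nn.Sequential(*blocks)
            self.head = nn.Linear(channels, num_classes)

        def forward(self, x):                  # x: (batch, sensors, time)
            h = self.tcn(x)                    # (batch, channels, time)
            return self.head(h.mean(dim=-1))   # pool over time, classify

    model = TemporalConvClassifier(num_sensors=8, num_classes=2)
    logits = model(torch.randn(4, 8, 256))     # 4 windows of 256 time steps
    ```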

    PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead

    (1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM has been a prevalent single-node alignment tool because of its high speed and accuracy, but the exponential growth of genome data calls for multi-node solutions that can handle large volumes of data, which remains a challenge. Spark is a ubiquitous big data platform that has been exploited to assist genome alignment with this challenge; nonetheless, existing works that use Spark to optimize BWA-MEM suffer from high overhead. (2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with low overhead using the pipe operation in Spark, further accelerated by a pipeline structure and in-memory computation. (3) Results: Our experiments show that on paired-end alignment tasks our framework has low overhead; in a multi-node environment it is, on average, 2.27× faster than BWASpark (the alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA. (4) Conclusions: PipeMEM accelerates BWA-MEM in the Spark environment with high performance and low overhead.
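
    A minimal PySpark sketch of the pipe-based pattern PipeMEM builds on: Spark's `pipe()` streams each partition's lines through an external process and collects its stdout. Paths and aligner flags below are placeholders, and PipeMEM's real pipeline adds record-boundary handling, staging, and in-memory optimizations not shown here.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pipe-bwa-sketch").getOrCreate()
    sc = spark.sparkContext

    # NOTE: a real implementation must keep each 4-line FASTQ record (or
    # interleaved pair) inside one partition; this sketch glosses over that.
    reads = sc.textFile("hdfs:///data/sample.fastq", minPartitions=32)

    # pipe() forwards each partition's lines to the command's stdin and
    # returns its stdout lines, so the aligner runs once per partition.
    aligned = reads.pipe("bwa mem -p /ref/genome.fa -")

    aligned.saveAsTextFile("hdfs:///out/sample.sam")
    ```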

    ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark

    Background: Next-generation sequencing delivers higher throughput at lower cost, and variant calling, the basis of high-throughput sequencing data analysis, is widely used in disease research, clinical treatment, and medicine research. However, current mainstream variant callers suffer from computation bottlenecks that produce long-tail tasks on large datasets, preventing high scalability on multi-node, multi-core clusters and leading to long runtimes and inefficient use of computing resources. A highly scalable tool that runs in a distributed environment would therefore be very useful for accelerating variant calling on large-scale genome data. Results: In this paper, we present ADS-HCSpark, a scalable variant calling tool based on the Apache Spark framework. ADS-HCSpark accelerates variant calling by parallelizing the mainstream GATK HaplotypeCaller algorithm across multiple cores and nodes. To address the computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on it is implemented, achieving good scalability on both single and multiple nodes. Because adjacent data blocks must have overlapping boundaries, the Hadoop-BAM library is customized to partition BAM files into overlapped blocks, further improving the accuracy of variant calling. Conclusions: ADS-HCSpark is a scalable Spark-based variant calling tool that parallelizes the GATK HaplotypeCaller algorithm. Evaluated on our cluster at its best-performing configuration, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller in single-node experiments, and 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA in multi-node experiments, with better scalability and accuracy above 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git.
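
    A minimal sketch of the two segmentation ideas, assuming byte-offset blocks and a toy density-based sizing rule; the real tool aligns blocks to BAM record boundaries via its customized Hadoop-BAM reader, which this sketch does not model.

    ```python
    def overlapped_blocks(file_size, block_size, overlap):
        """Overlapped boundaries: consecutive blocks share an `overlap`-byte
        margin so a record spanning a boundary is seen whole by one block."""
        blocks, start = [], 0
        while start < file_size:
            end = min(start + block_size + overlap, file_size)
            blocks.append((start, end))
            start += block_size
        return blocks

    def adaptive_block_sizes(region_density, base_size):
        """Adaptive segmentation: give dense (slow) regions smaller blocks so
        no single task becomes the long tail; weights here are hypothetical."""
        return {region: max(base_size // max(d, 1), 1)
                for region, d in region_density.items()}

    print(overlapped_blocks(10_000, 3_000, 200))
    # [(0, 3200), (3000, 6200), (6000, 9200), (9000, 10000)]
    ```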

    iBGP: A Bipartite Graph Propagation Approach for Mobile Advertising Fraud Detection

    Online mobile advertising plays a vital financial role in supporting free mobile apps, but detecting malicious app publishers who generate fraudulent actions on the advertisements hosted in their apps is difficult, since fraudulent traffic often mimics the behavior of legitimate users and evolves rapidly. In this paper, we propose iBGP, a novel bipartite graph-based propagation approach for mobile app advertising fraud detection in large advertising systems. We exploit the characteristics of mobile advertising users' behavior and identify two persistent patterns, power-law distribution and pertinence, and propose an automatic initial-score learning algorithm that formulates both concepts to learn the initial scores of non-seed nodes. We then propose a weighted graph propagation algorithm that propagates the scores of all nodes in the user-app bipartite graph until convergence. To extend our approach to large-scale settings, we decompose the objective function of the initial-score learning model into separate one-dimensional problems and parallelize the whole approach on an Apache Spark cluster. Applied to a large synthetic dataset and a large real-world mobile advertising dataset, iBGP significantly outperforms other popular graph-based propagation methods.
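
    A minimal NumPy sketch of weighted propagation on a user-app bipartite graph, assuming a damped averaging update anchored to the initial scores (the paper learns those initial scores; here they are simply inputs). The damping factor and normalization are assumptions, not iBGP's exact update rule.

    ```python
    import numpy as np

    def propagate(W, user_scores, app_scores, damping=0.85, tol=1e-6,
                  max_iter=100):
        """W[i, j] is the edge weight between user i and app j. Alternate
        damped neighbor-averaging updates until scores stop changing."""
        # Row/column-normalize so each step averages neighbor scores.
        Wu = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
        Wa = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)
        u0, a0 = user_scores.copy(), app_scores.copy()
        u, a = u0, a0
        for _ in range(max_iter):
            u_new = damping * (Wu @ a) + (1 - damping) * u0
            a_new = damping * (Wa.T @ u) + (1 - damping) * a0
            if max(np.abs(u_new - u).max(), np.abs(a_new - a).max()) < tol:
                break
            u, a = u_new, a_new
        return u, a
    ```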

    Dynamically weighted load evaluation method based on self-adaptive threshold in cloud computing

    Cloud resources and their loads are dynamic. Existing methods evaluate cloud resources with fixed physical indicators and fixed thresholds, which can neither meet the dynamic needs of cloud resources nor accurately reflect their states. To address this challenge, this paper proposes a self-adaptive-threshold-based dynamically weighted load evaluation method (SDWM), which evaluates a resource's load state through a dynamically weighted evaluation method. First, dynamic evaluation indicators are proposed to assess the resource state more accurately. Second, SDWM divides resource load into three states, Overload, Normal, and Idle, using a self-adaptive threshold; it migrates overloaded resources to balance load and releases idle resources whose idle time exceeds a threshold to save energy, effectively improving system utilization. Finally, SDWM uses an energy evaluation model that quantifies energy via the migration amount of resource requests, with the model's parameters obtained by linear regression from the actual experimental environment. Experimental results show that SDWM outperforms other methods in energy conservation, task response time, and resource utilization, with improvements of 31.5%, 50%, and 50.8%, respectively, demonstrating the positive effect of the dynamic self-adaptive threshold. In particular, SDWM adapts well when resources dynamically join or exit.
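
    A minimal sketch of dynamically weighted load scoring with a self-adaptive threshold, assuming a linear weighting and mean-plus-deviation cut-offs derived from recent history; both rules are stand-ins for SDWM's actual formulas.

    ```python
    from statistics import mean, stdev

    def load_score(cpu, io, mem, weights):
        """Weighted combination of per-dimension loads, each in [0, 1]."""
        w_cpu, w_io, w_mem = weights
        return w_cpu * cpu + w_io * io + w_mem * mem

    def adaptive_thresholds(history):
        """Derive Overload/Idle cut-offs from recent scores instead of a
        fixed constant, so the bounds track the cluster's actual state."""
        mu, sigma = mean(history), stdev(history)
        return mu + sigma, mu - sigma      # (overload, idle)

    def classify(score, history):
        hi, lo = adaptive_thresholds(history)
        if score > hi:
            return "Overload"   # candidate for migration
        if score < lo:
            return "Idle"       # candidate for release to save energy
        return "Normal"

    history = [0.42, 0.38, 0.55, 0.47, 0.51, 0.60, 0.35]
    print(classify(0.72, history), classify(0.20, history))  # Overload Idle
    ```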